

A Linear-Time Kernel Goodness-of-Fit Test

Neural Information Processing Systems

We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples. We learn the test features that best indicate the differences between observed samples and a reference model, by minimizing the false negative rate. These features are constructed via Stein's method, meaning that it is not necessary to compute the normalising constant of the model. We analyse the asymptotic Bahadur efficiency of the new test, and prove that under a mean-shift alternative, our test always has greater relative efficiency than a previous linear-time kernel test, regardless of the choice of parameters for that test. In experiments, the performance of our method exceeds that of the earlier linear-time test, and matches or exceeds the power of a quadratic-time kernel test. In high dimensions and where model structure may be exploited, our goodness of fit test performs far better than a quadratic-time two-sample test based on the Maximum Mean Discrepancy, with samples drawn from the model.
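To make the idea above concrete, here is a minimal sketch of a linear-time kernel Stein statistic for a one-dimensional model with known score function, using a Gaussian kernel. This is an illustration of the general Stein-kernel construction (which needs only the score, not the normalising constant), not the paper's optimized-feature test; all names, the bandwidth, and the sample sizes are illustrative choices.

```python
import numpy as np

def stein_kernel(x, y, score, sigma=1.0):
    """Stein-modified Gaussian kernel h(x, y) for a model with score
    function s(x) = d/dx log p(x); the normalising constant never appears."""
    d = x - y
    k = np.exp(-d**2 / (2 * sigma**2))
    return (score(x) * score(y)
            + (score(x) - score(y)) * d / sigma**2
            + 1.0 / sigma**2 - d**2 / sigma**4) * k

def linear_time_ksd(xs, score, sigma=1.0):
    """Average h over disjoint sample pairs: O(n) cost, as in linear-time tests."""
    m = len(xs) // 2
    return stein_kernel(xs[:2 * m:2], xs[1:2 * m:2], score, sigma).mean()

rng = np.random.default_rng(0)
score = lambda x: -x  # score function of the N(0, 1) reference model
ksd_null = linear_time_ksd(rng.normal(0.0, 1.0, 10000), score)   # model correct
ksd_shift = linear_time_ksd(rng.normal(1.0, 1.0, 10000), score)  # mean-shift alternative
```

Under the model, the statistic concentrates near zero; under the mean-shift alternative it converges to a strictly positive value, which is what gives the test its power.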


A Kernel Test for Three-Variable Interactions

Sejdinovic, Dino, Gretton, Arthur, Bergsma, Wicher

Neural Information Processing Systems

We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful three-variable interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
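As a rough illustration, one published form of the empirical Lancaster interaction statistic is built from three doubly centred Gram matrices. The sketch below uses that form with an illustrative Gaussian kernel and bandwidth, on a pairwise-independent XOR-style triple where only the three-way interaction carries signal (the V-structure setting the abstract describes); the data-generating choices are hypothetical.

```python
import numpy as np

def gram(v, sigma=1.0):
    # Gaussian Gram matrix for a one-dimensional sample.
    d = v[:, None] - v[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def lancaster_stat(x, y, z, sigma=1.0):
    # V-statistic estimate: mean over all entries of the entrywise
    # product of the three doubly centred Gram matrices HKH, HLH, HMH.
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L, M = (H @ gram(v, sigma) @ H for v in (x, y, z))
    return (K * L * M).mean()

rng = np.random.default_rng(1)
n = 300
x = rng.choice([-1.0, 1.0], n)
y = rng.choice([-1.0, 1.0], n)
z_dep = x * y + 0.1 * rng.normal(size=n)  # pairwise independent, jointly dependent
z_ind = rng.choice([-1.0, 1.0], n)        # fully independent control
stat_dep = lancaster_stat(x, y, z_dep)
stat_ind = lancaster_stat(x, y, z_ind)
```

Each pair (X, Z), (Y, Z), (X, Y) here is independent, so pairwise tests see nothing, yet the Lancaster statistic picks up the three-way dependence.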


Appendix: A Kernel Test for Quasi-Independence

Neural Information Processing Systems

Appendix A: Preliminary results. Appendix B: Proofs of Sections 2 and 3. Appendix C: Proof of Theorem 4.1 (null distribution). Appendix D: Proof of Theorem 4.2 (consistency under alternatives). Appendix E: Efficient wild bootstrap implementation. Appendix F: Review of related quasi-independence tests. Appendix G: Additional experiments and discussion.


Efficient Aggregated Kernel Tests using Incomplete U-statistics

Neural Information Processing Systems

This procedure provides a solution to the fundamental kernel selection problem as we can aggregate a large number of kernels with several bandwidths without incurring a significant loss of test power.
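The idea of an incomplete U-statistic can be sketched briefly: instead of averaging the MMD core over all O(n^2) index pairs, average over a random design of pairs, and do so for a whole collection of bandwidths. The sketch below is an assumption-laden illustration (Gaussian kernels, one-dimensional data, illustrative bandwidth grid), not the paper's calibrated aggregated test.

```python
import numpy as np

def mmd_core(xi, xj, yi, yj, sigma):
    # MMD U-statistic core h((x_i, y_i), (x_j, y_j)) with a Gaussian kernel.
    k = lambda a, b: np.exp(-(a - b)**2 / (2 * sigma**2))
    return k(xi, xj) + k(yi, yj) - k(xi, yj) - k(xj, yi)

def incomplete_mmd(x, y, sigma, n_pairs, rng):
    # Incomplete U-statistic: average the core over a random subset of
    # index pairs rather than all O(n^2) of them.
    n = min(len(x), len(y))
    i = rng.integers(0, n, n_pairs)
    j = rng.integers(0, n, n_pairs)
    keep = i != j
    return mmd_core(x[i[keep]], x[j[keep]], y[i[keep]], y[j[keep]], sigma).mean()

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 2000)
y = rng.normal(0.5, 1.0, 2000)           # shifted alternative
bandwidths = [0.25, 0.5, 1.0, 2.0, 4.0]  # the collection being aggregated
mmds = {s: incomplete_mmd(x, y, s, 20000, rng) for s in bandwidths}
```

An aggregated test would then combine the per-bandwidth statistics (e.g. via a corrected level for each), so that no single bandwidth has to be chosen in advance.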


New Methods for Boosting in Machine Learning, Part 3

#artificialintelligence

Abstract: The Nyström method is an effective tool for generating low-rank approximations of large matrices, and it is particularly useful for kernel-based learning. To improve the standard Nyström approximation, ensemble Nyström algorithms compute a mixture of Nyström approximations that are generated independently based on column resampling. We propose a new family of algorithms, boosting Nyström, which iteratively generates multiple "weak" Nyström approximations (each using a small number of columns) in an adaptive sequence, with each approximation aiming to compensate for the weaknesses of its predecessor, and then combines them to form one strong approximation. We demonstrate that our boosting Nyström algorithms can yield more efficient and accurate low-rank approximations to kernel matrices. Improvements over the standard and ensemble Nyström methods are illustrated by simulation studies and real-world data analysis.
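For context, here is a minimal sketch of the two baselines the boosting variant builds on: the standard Nyström approximation from a column subset, and a uniform ensemble mixture of independently sampled approximations. The kernel, sample sizes, and number of columns are illustrative assumptions.

```python
import numpy as np

def nystrom(K, idx):
    # Standard Nystrom: approximate K from a subset of its columns,
    # K_hat = C W^+ C^T, where C = K[:, idx] and W = K[idx][:, idx].
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 1))
K = np.exp(-(X - X.T)**2 / 2)  # Gaussian kernel matrix with fast spectral decay

single = nystrom(K, rng.choice(200, 20, replace=False))
# Ensemble Nystrom: uniform mixture of independently resampled approximations.
ensemble = np.mean([nystrom(K, rng.choice(200, 20, replace=False))
                    for _ in range(5)], axis=0)

err = lambda A: np.linalg.norm(K - A) / np.linalg.norm(K)
```

The boosting variant described in the abstract would instead pick later column subsets adaptively, targeting the residual left by earlier approximations, rather than sampling them independently.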


Boosting the Power of Kernel Two-Sample Tests

Chatterjee, Anirban, Bhattacharya, Bhaswar B.

arXiv.org Machine Learning

The kernel two-sample test based on the maximum mean discrepancy (MMD) is one of the most popular methods for detecting differences between two distributions over general metric spaces. In this paper we propose a method to boost the power of the kernel test by combining MMD estimates over multiple kernels using their Mahalanobis distance. We derive the asymptotic null distribution of the proposed test statistic and use a multiplier bootstrap approach to efficiently compute the rejection region. The resulting test is universally consistent and, since it is obtained by aggregating over a collection of kernels/bandwidths, is more powerful in detecting a wide range of alternatives in finite samples. We also derive the distribution of the test statistic for both fixed and local contiguous alternatives. The latter, in particular, implies that the proposed test is statistically efficient, that is, it has non-trivial asymptotic (Pitman) efficiency. Extensive numerical experiments are performed on both synthetic and real-world datasets to illustrate the efficacy of the proposed method over single kernel tests. Our asymptotic results rely on deriving the joint distribution of MMD estimates using the framework of multiple stochastic integrals, which is more broadly useful, specifically, in understanding the efficiency properties of recently proposed adaptive MMD tests based on kernel aggregation.
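A rough sketch of the combination idea: compute MMD estimates for several kernels, then measure the joint deviation of that vector from its null behaviour with a Mahalanobis distance. The version below estimates the null mean and covariance by permutation for simplicity, whereas the paper uses a multiplier bootstrap; kernels, bandwidths, and sample sizes are illustrative.

```python
import numpy as np

def mmd2(x, y, sigma):
    # Biased (V-statistic) squared MMD with a Gaussian kernel.
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def mahalanobis_mmd(x, y, sigmas, n_perm=100, rng=None):
    # Combine per-kernel MMD estimates via their Mahalanobis distance,
    # with null mean/covariance estimated from label permutations
    # (the paper itself uses a multiplier bootstrap instead).
    rng = rng or np.random.default_rng(4)
    obs = np.array([mmd2(x, y, s) for s in sigmas])
    pooled = np.concatenate([x, y])
    null = []
    for _ in range(n_perm):
        p = rng.permutation(pooled)
        null.append([mmd2(p[:len(x)], p[len(x):], s) for s in sigmas])
    null = np.array(null)
    Sinv = np.linalg.pinv(np.cov(null.T))
    d = obs - null.mean(axis=0)
    return float(d @ Sinv @ d)

rng = np.random.default_rng(4)
x, y = rng.normal(0, 1, 150), rng.normal(0.7, 1, 150)   # shifted alternative
stat_alt = mahalanobis_mmd(x, y, [0.5, 1.0, 2.0], rng=rng)
x2, y2 = rng.normal(0, 1, 150), rng.normal(0, 1, 150)   # null case
stat_null = mahalanobis_mmd(x2, y2, [0.5, 1.0, 2.0], rng=rng)
```

Whitening by the estimated covariance is what lets the combined statistic exploit several bandwidths at once instead of committing to a single kernel.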


Discussion of 'Multiscale Fisher's Independence Test for Multivariate Dependence'

Schrab, Antonin, Jitkrittum, Wittawat, Szabó, Zoltán, Sejdinovic, Dino, Gretton, Arthur

arXiv.org Machine Learning

We discuss how MultiFIT, the Multiscale Fisher's Independence Test for Multivariate Dependence proposed by Gorsky and Ma (2022), compares to existing linear-time kernel tests based on the Hilbert-Schmidt independence criterion (HSIC). We highlight the fact that the levels of the kernel tests at any finite sample size can be controlled exactly, as it is the case with the level of MultiFIT. In our experiments, we observe some of the performance limitations of MultiFIT in terms of test power.
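For reference, the simplest HSIC estimator the discussion builds on is the biased quadratic-time V-statistic (the linear-time variants compared in the note are cheaper approximations of the same quantity). The sketch below uses Gaussian kernels with an illustrative bandwidth.

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    # Biased V-statistic estimate of HSIC: trace(K H L H) / n^2,
    # with Gaussian kernels on each variable and centring matrix H.
    n = len(x)
    K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * sigma**2))
    L = np.exp(-(y[:, None] - y[None, :])**2 / (2 * sigma**2))
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(5)
x = rng.normal(size=500)
hsic_dep = hsic(x, x + 0.3 * rng.normal(size=500))  # strongly dependent pair
hsic_ind = hsic(x, rng.normal(size=500))            # independent pair
```

Under independence the statistic is O(1/n); under dependence it converges to a positive population value, which is the basis for both the quadratic-time and linear-time tests mentioned above.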

